codeflash-ai[bot] commented Oct 22, 2025

📄 24% (1.24x) speedup for Serializeable.from_dict in guardrails/classes/generic/serializeable.py

⏱️ Runtime: 5.98 milliseconds → 4.82 milliseconds (best of 125 runs)
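The headline figures are consistent with speedup = original / optimized - 1 computed on best-of-N timings: 5.98 / 4.82 - 1 ≈ 0.24, which is 24% faster, or 1.24x. Below is a minimal sketch of that bookkeeping using the standard-library timeit module. The helper names best_of and relative_speedup are hypothetical, and the formula is inferred from the reported numbers rather than taken from Codeflash's source.

```python
import timeit
from typing import Callable


def best_of(fn: Callable[[], object], runs: int = 125, number: int = 1) -> float:
    """Fastest of `runs` measurements, each timing `number` calls to fn (seconds)."""
    # Taking the minimum filters out noise from other processes and GC pauses.
    return min(timeit.repeat(fn, repeat=runs, number=number))


def relative_speedup(original: float, optimized: float) -> float:
    """Speedup as a fraction: 0.24 means the optimized code is 24% faster.

    Assumption: formula inferred from the reported figures.
    """
    return original / optimized - 1.0


print(f"{relative_speedup(5.98, 4.82):.0%}")  # 24%
```

The same formula reproduces the per-test figures below up to rounding of the displayed times: 21.6 μs → 7.89 μs gives about 174%, and 9.16 μs → 10.0 μs gives a negative value (about -8%), which the report labels as "slower".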

📝 Explanation and details

The optimized code runs about 24% faster overall through three changes:

1. Set-based attribute lookup: Changed attributes = dict.keys(annotations) to attributes = set(annotations). The dict_keys view returned by dict.keys() is already hash-backed, so the membership check snake_case(k) in attributes was an O(1) average-case lookup both before and after. This is a small constant-factor change, not an O(n) to O(1) improvement.

2. Eliminated redundant snake_case computations: The original code called snake_case(k) once in the membership check for every key, and a second time (to build the dictionary key) for every key that passed the check. The optimized version converts each key exactly once, sc_data = {snake_case(k): v for k, v in data.items()}, and then filters sc_data with a plain set-membership test. This appears to account for most of the gain.

3. Encoder default handling: Replaced the get-then-assign pattern with setdefault("encoder", SerializeableJSONEncoder), which performs the lookup and the conditional insert in one call instead of a get followed by a separate assignment. The saving is a small constant per call.
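The three changes can be compared side by side. The sketch below is a reconstruction from the snippets quoted above, not the guardrails source. It assumes the annotations come from typing.get_type_hints, uses a hypothetical regex stand-in for snake_case and a placeholder SerializeableJSONEncoder, and returns the keyword arguments instead of constructing the object.

```python
import re
from dataclasses import dataclass
from json import JSONEncoder
from typing import Any, Dict, get_type_hints


def snake_case(s: Any) -> str:
    # Hypothetical regex stand-in for the real snake_case helper.
    s = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", str(s))
    s = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", s)
    return s.lower()


class SerializeableJSONEncoder(JSONEncoder):
    # Placeholder for the library's default encoder.
    pass


def kwargs_original(cls: type, data: Dict[Any, Any]) -> Dict[str, Any]:
    # dict_keys is a hash-backed, set-like view: membership is O(1) on average.
    attributes = dict.keys(get_type_hints(cls))
    # snake_case runs in the filter for every key, then again in the key
    # expression for each key that passes the filter.
    kwargs = {
        snake_case(k): data.get(k) for k in data.keys() if snake_case(k) in attributes
    }
    kwargs["encoder"] = kwargs.get("encoder", SerializeableJSONEncoder)
    return kwargs


def kwargs_optimized(cls: type, data: Dict[Any, Any]) -> Dict[str, Any]:
    attributes = set(get_type_hints(cls))
    # One snake_case pass over all keys, then a plain membership filter.
    sc_data = {snake_case(k): v for k, v in data.items()}
    kwargs = {k: v for k, v in sc_data.items() if k in attributes}
    # Insert the default encoder only if the input did not supply one.
    kwargs.setdefault("encoder", SerializeableJSONEncoder)
    return kwargs


@dataclass
class Person:
    first_name: str
    last_name: str
    age: int


data = {"firstName": "Ada", "lastName": "Lovelace", "age": 36, "extra": 1}
print(kwargs_original(Person, data) == kwargs_optimized(Person, data))  # True
```

Both versions keep the last value when two input keys collapse to the same snake_case name, so the rewrite does not change which value wins.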

Performance characteristics by test case:

  • Small inputs (1-5 fields): mostly 6-26% faster, from fewer snake_case calls and less per-call overhead. Empty-input cases are roughly flat (1-3% faster), and DefaultField.from_dict({}) is 8.5% slower.
  • Large inputs (100 matching fields): 77-78% faster, because each of the 100 matching keys now needs one snake_case call instead of two.
  • High-noise inputs (100 fields plus 900 irrelevant keys): only about 5% faster, because every key still needs one snake_case call and only the 100 matching keys save their second one.

The gain grows with the number of keys that match declared fields, so it matters most for large, mostly relevant dictionaries or high-frequency deserialization. Inputs dominated by irrelevant keys see little benefit.
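Counting snake_case calls under each comprehension shape shows why the gain depends on how many keys match. The sketch below uses a hypothetical counting stand-in for snake_case and assumes the original comprehension filters with snake_case(k) and then recomputes it for the dictionary key, as described in point 2.

```python
import re
from typing import Any, Callable, Dict, Tuple


def counting_snake_case() -> Tuple[Callable[[Any], str], Dict[str, int]]:
    """Return a hypothetical snake_case stand-in and a dict counting its calls."""
    stats = {"calls": 0}

    def snake_case(s: Any) -> str:
        stats["calls"] += 1
        return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", str(s)).lower()

    return snake_case, stats


def snake_case_calls(n_fields: int, n_junk: int) -> Tuple[int, int]:
    """Return (original, optimized) snake_case call counts for an input with
    n_fields matching keys and n_junk irrelevant keys."""
    attributes = {f"field_{i}" for i in range(n_fields)}
    data: Dict[str, Any] = {f"field_{i}": i for i in range(n_fields)}
    data.update({f"junk_{i}": "x" for i in range(n_junk)})

    # Original shape: the filter and the key expression each call snake_case.
    sc, stats = counting_snake_case()
    _ = {sc(k): data.get(k) for k in data.keys() if sc(k) in attributes}
    original = stats["calls"]

    # Optimized shape: one pass converts every key, then a set filter.
    sc, stats = counting_snake_case()
    sc_data = {sc(k): v for k, v in data.items()}
    _ = {k: v for k, v in sc_data.items() if k in attributes}
    optimized = stats["calls"]
    return original, optimized


print(snake_case_calls(100, 0))    # (200, 100)
print(snake_case_calls(100, 900))  # (1100, 1000)
```

Halving the calls in the 100-field case is consistent with the 77-78% figures, and saving 100 of 1,100 calls in the noisy case is consistent with the roughly 5% figure. Call counts are only a proxy: the timings also include dataclass construction and the comprehensions themselves.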

Correctness verification report:

| Test | Status |
|---|---|
| ⚙️ Existing Unit Tests | 3 Passed |
| 🌀 Generated Regression Tests | 27 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 87.5% |
⚙️ Existing Unit Tests and Runtime

| Test File::Test Function | Original ⏱️ | Optimized ⏱️ | Speedup |
|---|---|---|---|
| unit_tests/cli/server/test_hub_client.py::test_get_jwt_token | 55.2μs | 51.9μs | 6.45%✅ |
🌀 Generated Regression Tests and Runtime
import inspect
import sys
from dataclasses import InitVar, asdict, dataclass, field
from json import JSONEncoder
from typing import Any, Dict

# imports
import pytest  # used for our unit tests
from guardrails.classes.generic.serializeable import Serializeable


# Local snake_case stub (not used by Serializeable.from_dict, which calls the
# snake_case helper imported in its own module)
def snake_case(s):
    # Converts camelCase or PascalCase to snake_case
    import re
    s = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', s)
    s = re.sub('([a-z0-9])([A-Z])', r'\1_\2', s)
    return s.lower()

# Local JSONEncoder stub (likewise unused by from_dict)
class SerializeableJSONEncoder(JSONEncoder):
    pass

encoder_kwargs = {}
if sys.version_info >= (3, 10):
    encoder_kwargs["kw_only"] = True
    encoder_kwargs["default"] = SerializeableJSONEncoder

# --- Unit tests ---

# Basic test case: Simple dataclass with one field
@dataclass
class Simple(Serializeable):
    foo: int

def test_basic_single_field():
    # Normal case: field matches exactly
    codeflash_output = Simple.from_dict({'foo': 42}); obj = codeflash_output # 23.0μs -> 20.0μs (15.5% faster)

def test_basic_snake_case_conversion():
    # Key is camelCase, should convert to snake_case
    codeflash_output = Simple.from_dict({'Foo': 99}); obj = codeflash_output # 22.8μs -> 19.6μs (16.3% faster)

def test_basic_extra_fields_ignored():
    # Extra fields should be ignored
    codeflash_output = Simple.from_dict({'foo': 1, 'bar': 2}); obj = codeflash_output # 24.7μs -> 22.7μs (8.83% faster)

# Basic test: Multiple fields, mixed case
@dataclass
class Multi(Serializeable):
    first_name: str
    last_name: str
    age: int

def test_basic_multiple_fields():
    codeflash_output = Multi.from_dict({'firstName': 'John', 'lastName': 'Doe', 'age': 30}); obj = codeflash_output # 37.4μs -> 30.8μs (21.3% faster)

def test_basic_missing_field():
    # Missing field: should raise TypeError
    with pytest.raises(TypeError):
        Multi.from_dict({'firstName': 'John', 'age': 30}) # 34.4μs -> 28.6μs (20.3% faster)

# Edge case: Empty dict
def test_edge_empty_dict():
    # All fields missing: should raise TypeError
    with pytest.raises(TypeError):
        Simple.from_dict({}) # 11.9μs -> 11.6μs (2.66% faster)

# Edge case: None values
def test_edge_none_value():
    # None value should be accepted if type allows
    @dataclass
    class Nullable(Serializeable):
        foo: Any
    codeflash_output = Nullable.from_dict({'foo': None}); obj = codeflash_output # 21.8μs -> 19.5μs (11.4% faster)

# Edge case: Unusual key names

def test_edge_non_string_key():
    @dataclass
    class NumKey(Serializeable):
        foo: int
    # Key is integer, should be ignored
    codeflash_output = NumKey.from_dict({123: 456, 'foo': 789}); obj = codeflash_output # 25.5μs -> 23.8μs (7.07% faster)

# Edge case: Data is not a dict
def test_edge_non_dict_input():
    with pytest.raises(AttributeError):
        Simple.from_dict(['foo', 42]) # 21.6μs -> 7.89μs (174% faster)

# Edge case: Field is missing, but has a default
@dataclass
class DefaultField(Serializeable):
    foo: int = 10

def test_edge_field_with_default():
    codeflash_output = DefaultField.from_dict({}); obj = codeflash_output # 9.16μs -> 10.0μs (8.50% slower)

def test_edge_field_with_default_overridden():
    codeflash_output = DefaultField.from_dict({'foo': 99}); obj = codeflash_output # 23.2μs -> 21.1μs (10.0% faster)

# Edge case: encoder explicitly provided
def test_edge_encoder_override():
    custom_encoder = JSONEncoder()
    codeflash_output = Simple.from_dict({'foo': 7, 'encoder': custom_encoder}); obj = codeflash_output # 24.8μs -> 23.4μs (5.77% faster)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import inspect
import sys
from dataclasses import InitVar, asdict, dataclass, field
from json import JSONEncoder
from typing import Any, Dict

# imports
import pytest
from guardrails.classes.generic.serializeable import Serializeable


# Local snake_case stub (not used by Serializeable.from_dict, which calls the
# snake_case helper imported in its own module)
def snake_case(s):
    # Simple snake_case implementation for test purposes
    import re
    s = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', s)
    s = re.sub('([a-z0-9])([A-Z])', r'\1_\2', s)
    return s.lower()

# Local stub for SerializeableJSONEncoder (likewise unused by from_dict)
class SerializeableJSONEncoder(JSONEncoder):
    pass

encoder_kwargs = {}
if sys.version_info >= (3, 10):
    encoder_kwargs["kw_only"] = True
    encoder_kwargs["default"] = SerializeableJSONEncoder

# --- UNIT TESTS ---

# Helper: Example subclass for basic and edge tests
@dataclass
class Person(Serializeable):
    first_name: str
    last_name: str
    age: int

# Helper: Subclass with more fields for large scale
@dataclass
class LargeDataClass(Serializeable):
    # 100 fields
    field_0: int; field_1: int; field_2: int; field_3: int; field_4: int
    field_5: int; field_6: int; field_7: int; field_8: int; field_9: int
    field_10: int; field_11: int; field_12: int; field_13: int; field_14: int
    field_15: int; field_16: int; field_17: int; field_18: int; field_19: int
    field_20: int; field_21: int; field_22: int; field_23: int; field_24: int
    field_25: int; field_26: int; field_27: int; field_28: int; field_29: int
    field_30: int; field_31: int; field_32: int; field_33: int; field_34: int
    field_35: int; field_36: int; field_37: int; field_38: int; field_39: int
    field_40: int; field_41: int; field_42: int; field_43: int; field_44: int
    field_45: int; field_46: int; field_47: int; field_48: int; field_49: int
    field_50: int; field_51: int; field_52: int; field_53: int; field_54: int
    field_55: int; field_56: int; field_57: int; field_58: int; field_59: int
    field_60: int; field_61: int; field_62: int; field_63: int; field_64: int
    field_65: int; field_66: int; field_67: int; field_68: int; field_69: int
    field_70: int; field_71: int; field_72: int; field_73: int; field_74: int
    field_75: int; field_76: int; field_77: int; field_78: int; field_79: int
    field_80: int; field_81: int; field_82: int; field_83: int; field_84: int
    field_85: int; field_86: int; field_87: int; field_88: int; field_89: int
    field_90: int; field_91: int; field_92: int; field_93: int; field_94: int
    field_95: int; field_96: int; field_97: int; field_98: int; field_99: int

# -------- BASIC TEST CASES --------

def test_from_dict_basic_fields():
    # Test that from_dict correctly constructs an object with all fields present
    data = {"first_name": "John", "last_name": "Doe", "age": 30}
    codeflash_output = Person.from_dict(data); p = codeflash_output # 38.4μs -> 32.5μs (18.1% faster)

def test_from_dict_snake_case_conversion():
    # Test that from_dict works with camelCase keys by converting them to snake_case
    data = {"firstName": "Alice", "lastName": "Smith", "age": 25}
    codeflash_output = Person.from_dict(data); p = codeflash_output # 35.9μs -> 28.8μs (24.6% faster)

def test_from_dict_extra_keys_ignored():
    # Test that extra keys in the input dict are ignored
    data = {"first_name": "Jane", "last_name": "Roe", "age": 40, "irrelevant": "foo"}
    codeflash_output = Person.from_dict(data); p = codeflash_output # 40.4μs -> 34.5μs (17.1% faster)

def test_from_dict_encoder_default():
    # Test that encoder is set to SerializeableJSONEncoder by default
    data = {"first_name": "Sam", "last_name": "Lee", "age": 50}
    codeflash_output = Person.from_dict(data); p = codeflash_output # 38.9μs -> 31.7μs (22.5% faster)

def test_from_dict_encoder_override():
    # Test that encoder can be overridden if present in input
    class DummyEncoder(JSONEncoder): pass
    data = {"first_name": "Sam", "last_name": "Lee", "age": 50, "encoder": DummyEncoder}
    codeflash_output = Person.from_dict(data); p = codeflash_output # 40.2μs -> 34.9μs (15.1% faster)

# -------- EDGE TEST CASES --------

def test_from_dict_missing_field_raises():
    # Test that missing required fields raises a TypeError
    data = {"first_name": "Jane", "age": 22}  # missing last_name
    with pytest.raises(TypeError):
        Person.from_dict(data) # 36.6μs -> 30.1μs (21.4% faster)

def test_from_dict_empty_dict():
    # Test that empty dict raises TypeError (all required fields missing)
    with pytest.raises(TypeError):
        Person.from_dict({}) # 13.3μs -> 13.1μs (1.23% faster)

def test_from_dict_field_with_none_value():
    # Test that None is accepted as a value (for fields that allow it)
    # Redefine Person with optional field
    from dataclasses import dataclass
    from typing import Optional
    @dataclass
    class PersonOpt(Serializeable):
        first_name: str
        last_name: str
        age: Optional[int]
    data = {"first_name": "Bob", "last_name": "Brown", "age": None}
    codeflash_output = PersonOpt.from_dict(data); p = codeflash_output # 38.1μs -> 31.1μs (22.6% faster)

def test_from_dict_unexpected_key_with_similar_name():
    # Test that keys that almost match (but don't) are ignored
    data = {"first_names": "Eve", "last_name": "Adams", "age": 44}
    with pytest.raises(TypeError):  # missing required 'first_name'
        Person.from_dict(data) # 39.6μs -> 34.6μs (14.6% faster)

def test_from_dict_with_int_keys():
    # Test that integer keys are ignored (their snake_case form matches no field)
    data = {1: "Alice", "first_name": "Bob", "last_name": "Smith", "age": 20}
    codeflash_output = Person.from_dict(data); p = codeflash_output # 43.1μs -> 35.1μs (22.5% faster)

def test_from_dict_with_non_str_field_values():
    # Test that fields with various types (e.g., list, dict) are passed through
    @dataclass
    class ComplexPerson(Serializeable):
        first_name: str
        last_name: str
        age: int
        tags: list
        meta: dict
    data = {
        "first_name": "Zoe",
        "last_name": "Zebra",
        "age": 28,
        "tags": ["vip", "premium"],
        "meta": {"score": 100}
    }
    codeflash_output = ComplexPerson.from_dict(data); p = codeflash_output # 46.2μs -> 36.7μs (25.8% faster)

# -------- LARGE SCALE TEST CASES --------

def test_from_dict_large_number_of_fields():
    # Test that from_dict works with a dataclass with many fields
    data = {f"field_{i}": i for i in range(100)}
    codeflash_output = LargeDataClass.from_dict(data); obj = codeflash_output # 668μs -> 377μs (77.1% faster)
    for i in range(100):
        assert getattr(obj, f"field_{i}") == i

def test_from_dict_large_input_dict_with_extra_keys():
    # Test that from_dict ignores extra keys in a large input dict
    data = {f"field_{i}": i for i in range(100)}
    # Add 900 extra irrelevant keys
    data.update({f"junk_{i}": "x" for i in range(900)})
    codeflash_output = LargeDataClass.from_dict(data); obj = codeflash_output # 3.19ms -> 3.04ms (4.95% faster)
    for i in range(100):
        assert getattr(obj, f"field_{i}") == i
    # None of the junk keys should be in the object's __dict__
    for i in range(900):
        assert f"junk_{i}" not in vars(obj)

def test_from_dict_large_scale_snake_case():
    # Test that from_dict handles camelCase keys for large dataclass
    data = {f"field{i}": i for i in range(100)}  # keys without underscores, e.g. "field0"
    codeflash_output = LargeDataClass.from_dict(data); obj = codeflash_output # 708μs -> 398μs (78.0% faster)
    for i in range(100):
        assert getattr(obj, f"field_{i}") == i

def test_from_dict_performance_large():
    # Performance: from_dict should not be unreasonably slow (basic sanity check)
    import time
    data = {f"field_{i}": i for i in range(100)}
    start = time.time()
    codeflash_output = LargeDataClass.from_dict(data); obj = codeflash_output # 658μs -> 370μs (77.7% faster)
    elapsed = time.time() - start
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes, run git checkout codeflash/optimize-Serializeable.from_dict-mh2l1mq6 and push.

codeflash-ai[bot] requested a review from mashraf-222 on October 22, 2025 at 22:46
codeflash-ai[bot] added the ⚡️ codeflash label (Optimization PR opened by Codeflash AI) on Oct 22, 2025